Optimization of Multiple Sequence Alignment Software ClustalW
Authors
Abstract
This activity within the PRACE-2IP project aims to investigate and improve the performance of the multiple sequence alignment software ClustalW on the BlueGene/Q supercomputer JUQUEEN, using influenza virus sequences as a case study. The code was ported, tuned, profiled, and scaled for this purpose. A parallel I/O interface was designed for efficient input of the sequence dataset: the local master of each sub-group reads the dataset and broadcasts it to the slaves in its group. The optimal group size was investigated, and the effect of read buffer size on read performance was measured. Applied to ClustalW, the implementation with parallel I/O performs considerably better than the original code in the I/O segment, reaching a speed-up of up to 6.8 times for dataset input when using 8192 JUQUEEN cores.
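The grouped input scheme described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: it runs in plain Python without MPI, the function names (split_into_groups, local_masters, simulate_input) are hypothetical, consecutive ranks are assumed to form a sub-group, and the lowest rank in each sub-group is assumed to act as local master.

```python
from typing import Dict, List, Tuple


def split_into_groups(n_ranks: int, group_size: int) -> Dict[int, List[int]]:
    """Partition ranks 0..n_ranks-1 into consecutive sub-groups.

    Uses group id = rank // group_size (an assumed layout); the last
    group is smaller when group_size does not divide n_ranks.
    """
    if n_ranks <= 0 or group_size <= 0:
        raise ValueError("n_ranks and group_size must be positive")
    groups: Dict[int, List[int]] = {}
    for rank in range(n_ranks):
        groups.setdefault(rank // group_size, []).append(rank)
    return groups


def local_masters(groups: Dict[int, List[int]]) -> List[int]:
    """Pick the lowest rank of each sub-group as its local master (assumption)."""
    return [members[0] for _, members in sorted(groups.items())]


def simulate_input(
    dataset: bytes, n_ranks: int, group_size: int
) -> Tuple[Dict[int, bytes], int]:
    """Simulate the read-then-broadcast input phase.

    Returns the per-rank copy of the dataset and the number of file
    reads performed (one per local master).
    """
    groups = split_into_groups(n_ranks, group_size)
    received: Dict[int, bytes] = {}
    reads = 0
    for master in local_masters(groups):
        data = dataset  # stands in for the local master's file read
        reads += 1
        for rank in groups[master // group_size]:
            received[rank] = data  # stands in for the intra-group broadcast
    return received, reads


if __name__ == "__main__":
    copies, reads = simulate_input(b">seq1\nACGT\n", n_ranks=8, group_size=4)
    print(f"{len(copies)} ranks received the dataset using {reads} file reads")
```

In this sketch the number of file reads equals the number of sub-groups, so the group size trades concurrent readers on the file system against broadcast fan-out inside each group, which is the kind of trade-off behind the group-size investigation mentioned in the abstract.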
Similar resources
Optimization and Scaling of Multiple Sequence Alignment Software ClustalW on Intel Xeon Phi
This work aims to investigate and improve the performance of the multiple sequence alignment software ClustalW on the test platform EURORA at CINECA, using influenza virus sequences as a case study. The objective is code optimization, porting, scaling, and performance evaluation of the parallel multiple sequence alignment software ClustalW for Intel Xeon Phi (the MIC architecture). For this p...
Scaling of Parallel Software for Biological Sequences Alignment and Homology Search on the Supercomputer BlueGene/P
This paper evaluates the scaling of parallel software for biological sequence alignment and homology searching on the BlueGene/P supercomputer, using the BLAST algorithm for sequence searching and the ClustalW algorithm for multiple sequence alignment. The case study covers influenza virus sequence variability and homology searching against the human genome.
QOMA: quasi-optimal multiple alignment of protein sequences
MOTIVATION We consider the problem of multiple alignment of protein sequences with the goal of achieving a large SP (Sum-of-Pairs) score. RESULTS We introduce a new graph-based method. We name our method QOMA (Quasi-Optimal Multiple Alignment). QOMA starts with an initial alignment. It represents this alignment using a K-partite graph. It then improves the SP score of the initial alignment th...
BioTools: Tools based on Biostrings (alignment, classification, database)
There are many stand-alone tools available for bioinformatics. This package aims to use R and the Biostrings package as a common interface to several important tools for multiple sequence alignment (clustalw, kalign), classification (RDP), and sequence retrieval (BLAST), as well as database-driven sequence management for 16S rRNA.
Using Traveling Salesman Problem Algorithms to Determine Multiple Sequence Alignment Orders
Multiple Sequence Alignment (MSA) is one of the most important tools in modern biology. The MSA problem is NP-hard, therefore, heuristic approaches are needed to align a large set of data within a reasonable time. Among existing heuristic approaches, CLUSTALW has been found to be the progressive alignment program that provides the best quality alignments, while the program POA provides very fas...